Solving Human Resources Issues¶
Preparation¶
Source of data¶
Employee Data
The data contains demographic details, work-related metrics, and an attrition flag.
- EmployeeId - Employee Identifier
- Attrition - Did the employee leave the company? (0=no, 1=yes)
- Age - Age of the employee
- BusinessTravel - Travel commitments for the job
- DailyRate - Daily salary
- Department - Employee Department
- DistanceFromHome - Distance from work to home (in km)
- Education - 1-Below College, 2-College, 3-Bachelor, 4-Master, 5-Doctor
- EducationField - Field of Education
- EnvironmentSatisfaction - 1-Low, 2-Medium, 3-High, 4-Very High
- Gender - Employee's gender
- HourlyRate - Hourly salary
- JobInvolvement - 1-Low, 2-Medium, 3-High, 4-Very High
- JobLevel - Level of job (1 to 5)
- JobRole - Job Roles
- JobSatisfaction - 1-Low, 2-Medium, 3-High, 4-Very High
- MaritalStatus - Marital Status
- MonthlyIncome - Monthly salary
- MonthlyRate - Monthly rate
- NumCompaniesWorked - Number of companies worked at
- Over18 - Over 18 years of age?
- OverTime - Overtime?
- PercentSalaryHike - The percentage increase in salary last year
- PerformanceRating - 1-Low, 2-Good, 3-Excellent, 4-Outstanding
- RelationshipSatisfaction - 1-Low, 2-Medium, 3-High, 4-Very High
- StandardHours - Standard Hours
- StockOptionLevel - Stock Option Level
- TotalWorkingYears - Total years worked
- TrainingTimesLastYear - Number of training sessions attended last year
- WorkLifeBalance - 1-Low, 2-Good, 3-Excellent, 4-Outstanding
- YearsAtCompany - Years at Company
- YearsInCurrentRole - Years in the current role
- YearsSinceLastPromotion - Years since the last promotion
- YearsWithCurrManager - Years with the current manager
Acknowledgements
IBM Watson Analytics Use Case for HR Retaining Valuable Employees
Import the module and dataset¶
In [1]:
import os
import time
import warnings
# Suppress warnings to keep the notebook output clean
warnings.filterwarnings("ignore")
%load_ext autoreload
%autoreload 2
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Set pandas display options to maximize output visibility
pd.set_option("display.max_columns", None)
pd.set_option("display.max_colwidth", None)
pd.set_option("display.width", 0)
pd.set_option("display.expand_frame_repr", False)
Data imported into dataframe 'employee_df'¶
In [2]:
# Import dataset
datasource = "dataset/employee_data.csv"
employee_df = pd.read_csv(datasource, encoding='windows-1252')
EDA¶
Data Info¶
In [3]:
print("Dataframe Info:\n")
employee_df.info()
Dataframe Info: <class 'pandas.core.frame.DataFrame'> RangeIndex: 1470 entries, 0 to 1469 Data columns (total 35 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 EmployeeId 1470 non-null int64 1 Age 1470 non-null int64 2 Attrition 1058 non-null float64 3 BusinessTravel 1470 non-null object 4 DailyRate 1470 non-null int64 5 Department 1470 non-null object 6 DistanceFromHome 1470 non-null int64 7 Education 1470 non-null int64 8 EducationField 1470 non-null object 9 EmployeeCount 1470 non-null int64 10 EnvironmentSatisfaction 1470 non-null int64 11 Gender 1470 non-null object 12 HourlyRate 1470 non-null int64 13 JobInvolvement 1470 non-null int64 14 JobLevel 1470 non-null int64 15 JobRole 1470 non-null object 16 JobSatisfaction 1470 non-null int64 17 MaritalStatus 1470 non-null object 18 MonthlyIncome 1470 non-null int64 19 MonthlyRate 1470 non-null int64 20 NumCompaniesWorked 1470 non-null int64 21 Over18 1470 non-null object 22 OverTime 1470 non-null object 23 PercentSalaryHike 1470 non-null int64 24 PerformanceRating 1470 non-null int64 25 RelationshipSatisfaction 1470 non-null int64 26 StandardHours 1470 non-null int64 27 StockOptionLevel 1470 non-null int64 28 TotalWorkingYears 1470 non-null int64 29 TrainingTimesLastYear 1470 non-null int64 30 WorkLifeBalance 1470 non-null int64 31 YearsAtCompany 1470 non-null int64 32 YearsInCurrentRole 1470 non-null int64 33 YearsSinceLastPromotion 1470 non-null int64 34 YearsWithCurrManager 1470 non-null int64 dtypes: float64(1), int64(26), object(8) memory usage: 402.1+ KB
In [4]:
employee_df.sample(10)
Out[4]:
| EmployeeId | Age | Attrition | BusinessTravel | DailyRate | Department | DistanceFromHome | Education | EducationField | EmployeeCount | EnvironmentSatisfaction | Gender | HourlyRate | JobInvolvement | JobLevel | JobRole | JobSatisfaction | MaritalStatus | MonthlyIncome | MonthlyRate | NumCompaniesWorked | Over18 | OverTime | PercentSalaryHike | PerformanceRating | RelationshipSatisfaction | StandardHours | StockOptionLevel | TotalWorkingYears | TrainingTimesLastYear | WorkLifeBalance | YearsAtCompany | YearsInCurrentRole | YearsSinceLastPromotion | YearsWithCurrManager | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 326 | 327 | 45 | 1.0 | Travel_Frequently | 306 | Sales | 26 | 4 | Life Sciences | 1 | 1 | Female | 100 | 3 | 2 | Sales Executive | 1 | Married | 4286 | 5630 | 2 | Y | No | 14 | 3 | 4 | 80 | 2 | 5 | 4 | 3 | 1 | 1 | 0 | 0 |
| 566 | 567 | 26 | 0.0 | Travel_Frequently | 921 | Research & Development | 1 | 1 | Medical | 1 | 1 | Female | 66 | 2 | 1 | Research Scientist | 3 | Divorced | 2007 | 25265 | 1 | Y | No | 13 | 3 | 3 | 80 | 2 | 5 | 5 | 3 | 5 | 3 | 1 | 3 |
| 1164 | 1165 | 35 | NaN | Travel_Rarely | 1224 | Sales | 7 | 4 | Life Sciences | 1 | 3 | Female | 55 | 3 | 2 | Sales Executive | 4 | Married | 5204 | 13586 | 1 | Y | Yes | 11 | 3 | 4 | 80 | 0 | 10 | 2 | 3 | 10 | 8 | 0 | 9 |
| 156 | 157 | 56 | NaN | Non-Travel | 667 | Research & Development | 1 | 4 | Life Sciences | 1 | 3 | Male | 57 | 3 | 2 | Healthcare Representative | 3 | Divorced | 6306 | 26236 | 1 | Y | No | 21 | 4 | 1 | 80 | 1 | 13 | 2 | 2 | 13 | 12 | 1 | 9 |
| 346 | 347 | 26 | 0.0 | Travel_Frequently | 496 | Research & Development | 11 | 2 | Medical | 1 | 1 | Male | 60 | 3 | 2 | Healthcare Representative | 1 | Married | 4741 | 22722 | 1 | Y | Yes | 13 | 3 | 3 | 80 | 1 | 5 | 3 | 3 | 5 | 3 | 3 | 3 |
| 1134 | 1135 | 44 | 1.0 | Travel_Rarely | 621 | Research & Development | 15 | 3 | Medical | 1 | 1 | Female | 73 | 3 | 3 | Healthcare Representative | 4 | Married | 7978 | 14075 | 1 | Y | No | 11 | 3 | 4 | 80 | 1 | 10 | 2 | 3 | 10 | 7 | 0 | 5 |
| 456 | 457 | 27 | 0.0 | Travel_Frequently | 994 | Sales | 8 | 3 | Life Sciences | 1 | 4 | Male | 37 | 3 | 3 | Sales Executive | 3 | Single | 8726 | 2975 | 1 | Y | No | 15 | 3 | 4 | 80 | 0 | 9 | 0 | 3 | 9 | 8 | 1 | 7 |
| 526 | 527 | 36 | 1.0 | Travel_Rarely | 660 | Research & Development | 15 | 3 | Other | 1 | 1 | Male | 81 | 3 | 2 | Laboratory Technician | 3 | Divorced | 4834 | 7858 | 7 | Y | No | 14 | 3 | 2 | 80 | 1 | 9 | 3 | 2 | 1 | 0 | 0 | 0 |
| 1293 | 1294 | 43 | NaN | Travel_Frequently | 1422 | Sales | 2 | 4 | Life Sciences | 1 | 1 | Male | 92 | 3 | 2 | Sales Executive | 4 | Married | 5675 | 19246 | 1 | Y | No | 20 | 4 | 3 | 80 | 1 | 7 | 5 | 3 | 7 | 7 | 7 | 7 |
| 1230 | 1231 | 54 | 0.0 | Travel_Rarely | 397 | Human Resources | 19 | 4 | Medical | 1 | 3 | Male | 88 | 3 | 3 | Human Resources | 2 | Married | 10725 | 6729 | 2 | Y | No | 15 | 3 | 3 | 80 | 1 | 16 | 1 | 4 | 9 | 7 | 7 | 1 |
In [5]:
# Displaying the number of unique values in each column
employee_df.nunique().to_frame()
Out[5]:
| 0 | |
|---|---|
| EmployeeId | 1470 |
| Age | 43 |
| Attrition | 2 |
| BusinessTravel | 3 |
| DailyRate | 886 |
| Department | 3 |
| DistanceFromHome | 29 |
| Education | 5 |
| EducationField | 6 |
| EmployeeCount | 1 |
| EnvironmentSatisfaction | 4 |
| Gender | 2 |
| HourlyRate | 71 |
| JobInvolvement | 4 |
| JobLevel | 5 |
| JobRole | 9 |
| JobSatisfaction | 4 |
| MaritalStatus | 3 |
| MonthlyIncome | 1349 |
| MonthlyRate | 1427 |
| NumCompaniesWorked | 10 |
| Over18 | 1 |
| OverTime | 2 |
| PercentSalaryHike | 15 |
| PerformanceRating | 2 |
| RelationshipSatisfaction | 4 |
| StandardHours | 1 |
| StockOptionLevel | 4 |
| TotalWorkingYears | 40 |
| TrainingTimesLastYear | 7 |
| WorkLifeBalance | 4 |
| YearsAtCompany | 37 |
| YearsInCurrentRole | 19 |
| YearsSinceLastPromotion | 16 |
| YearsWithCurrManager | 18 |
In [6]:
# Unique values from the 'Attrition' column
employee_df['Attrition'].unique()
Out[6]:
array([nan, 1., 0.])
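Because of the missing entries, pandas stores the flag as `float64` (`nan`, `1.`, `0.`). A self-contained sketch of checking the label balance while keeping the missing values visible:

```python
import pandas as pd
import numpy as np

# Toy float column with NaN, mimicking the 'Attrition' flag above.
attrition = pd.Series([np.nan, 1.0, 0.0, 0.0, np.nan, 1.0])

# dropna=False keeps NaN as its own bucket, so the class balance
# and the amount of unlabeled data are visible in one view.
counts = attrition.value_counts(dropna=False)
print(counts)
```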
Data format¶
In [7]:
# Column lists grouped by intended data type
employee_col_numeric_int = [
'Age', 'DailyRate', 'DistanceFromHome',
'EmployeeCount', 'HourlyRate',
'MonthlyIncome', 'MonthlyRate', 'NumCompaniesWorked',
'PercentSalaryHike', 'StandardHours',
'TotalWorkingYears', 'TrainingTimesLastYear',
'YearsAtCompany', 'YearsInCurrentRole',
'YearsSinceLastPromotion', 'YearsWithCurrManager'
]
employee_col_numeric_int = pd.Index(employee_col_numeric_int)
employee_col_categorical = [
'BusinessTravel', 'Department', 'Education', 'EducationField',
'EnvironmentSatisfaction', 'Gender', 'JobInvolvement', 'JobLevel',
'JobRole', 'JobSatisfaction', 'MaritalStatus', 'Over18', 'OverTime',
'PerformanceRating', 'RelationshipSatisfaction', 'StockOptionLevel',
'WorkLifeBalance'
]
employee_col_categorical = pd.Index(employee_col_categorical)
employee_col_object = ['EmployeeId']
employee_col_object = pd.Index(employee_col_object)
In [8]:
# Convert to integer
employee_df[employee_col_numeric_int] = employee_df[employee_col_numeric_int].astype('int64')
# Conversion to category
employee_df[employee_col_categorical] = employee_df[employee_col_categorical].astype('category')
# Convert object columns (outside of other categories)
employee_df['EmployeeId'] = employee_df['EmployeeId'].astype('object')
employee_df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 1470 entries, 0 to 1469 Data columns (total 35 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 EmployeeId 1470 non-null object 1 Age 1470 non-null int64 2 Attrition 1058 non-null float64 3 BusinessTravel 1470 non-null category 4 DailyRate 1470 non-null int64 5 Department 1470 non-null category 6 DistanceFromHome 1470 non-null int64 7 Education 1470 non-null category 8 EducationField 1470 non-null category 9 EmployeeCount 1470 non-null int64 10 EnvironmentSatisfaction 1470 non-null category 11 Gender 1470 non-null category 12 HourlyRate 1470 non-null int64 13 JobInvolvement 1470 non-null category 14 JobLevel 1470 non-null category 15 JobRole 1470 non-null category 16 JobSatisfaction 1470 non-null category 17 MaritalStatus 1470 non-null category 18 MonthlyIncome 1470 non-null int64 19 MonthlyRate 1470 non-null int64 20 NumCompaniesWorked 1470 non-null int64 21 Over18 1470 non-null category 22 OverTime 1470 non-null category 23 PercentSalaryHike 1470 non-null int64 24 PerformanceRating 1470 non-null category 25 RelationshipSatisfaction 1470 non-null category 26 StandardHours 1470 non-null int64 27 StockOptionLevel 1470 non-null category 28 TotalWorkingYears 1470 non-null int64 29 TrainingTimesLastYear 1470 non-null int64 30 WorkLifeBalance 1470 non-null category 31 YearsAtCompany 1470 non-null int64 32 YearsInCurrentRole 1470 non-null int64 33 YearsSinceLastPromotion 1470 non-null int64 34 YearsWithCurrManager 1470 non-null int64 dtypes: category(17), float64(1), int64(16), object(1) memory usage: 234.3+ KB
Zero-Variance Feature¶
In [9]:
# Removing Zero-Variance Feature
try:
print("Column with only one unique value:")
const_cols = [col for col in employee_df.columns if employee_df[col].nunique() == 1]
print(const_cols)
employee_df = employee_df.drop(columns=const_cols)
print("Columns with only one unique value have been removed.")
except KeyError as e:
print(f"Column not found when dropping: {e}")
except Exception as e:
    print(f"An error occurred: {e}")
finally:
print("Zero-Variance column cleaning is complete.")
Column with only one unique value: ['EmployeeCount', 'Over18', 'StandardHours'] Columns with only one unique value have been removed. Zero-Variance column cleaning is complete.
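The `try`/`except` above drops the constant columns by name; an equivalent vectorized one-liner (a sketch on a toy frame) keeps only columns with more than one unique value:

```python
import pandas as pd

# Toy frame: 'const' has a single unique value, like EmployeeCount/StandardHours.
df = pd.DataFrame({"const": [1, 1, 1], "varies": [1, 2, 3]})

# Boolean column mask: keep columns whose number of unique values exceeds 1.
df = df.loc[:, df.nunique() > 1]
print(df.columns.tolist())  # ['varies']
```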
Null, NaN¶
In [10]:
employee_df.isnull().sum().to_frame()
Out[10]:
| 0 | |
|---|---|
| EmployeeId | 0 |
| Age | 0 |
| Attrition | 412 |
| BusinessTravel | 0 |
| DailyRate | 0 |
| Department | 0 |
| DistanceFromHome | 0 |
| Education | 0 |
| EducationField | 0 |
| EnvironmentSatisfaction | 0 |
| Gender | 0 |
| HourlyRate | 0 |
| JobInvolvement | 0 |
| JobLevel | 0 |
| JobRole | 0 |
| JobSatisfaction | 0 |
| MaritalStatus | 0 |
| MonthlyIncome | 0 |
| MonthlyRate | 0 |
| NumCompaniesWorked | 0 |
| OverTime | 0 |
| PercentSalaryHike | 0 |
| PerformanceRating | 0 |
| RelationshipSatisfaction | 0 |
| StockOptionLevel | 0 |
| TotalWorkingYears | 0 |
| TrainingTimesLastYear | 0 |
| WorkLifeBalance | 0 |
| YearsAtCompany | 0 |
| YearsInCurrentRole | 0 |
| YearsSinceLastPromotion | 0 |
| YearsWithCurrManager | 0 |
Handle NaN values in the 'Attrition' column by separating those rows out (saved for later prediction) and dropping them from the clean dataset.
In [11]:
# Separate rows with NaN values in the 'Attrition' column.
employee_attrition_nan = employee_df[employee_df['Attrition'].isna()]
# Save to a CSV file that will later be used for prediction.
employee_attrition_nan.to_csv('saved/employee_attrition_nan.csv', index=False)
# Drop rows with NaN in 'Attrition' to create a clean dataset.
# .copy() avoids SettingWithCopyWarning on the assignment below.
employee_df_clean = employee_df.dropna(subset=['Attrition']).copy()
# Convert the 'Attrition' column to integer type
employee_df_clean['Attrition'] = employee_df_clean['Attrition'].astype(int)
# Show the number of null values after cleaning
print("\nNumber of null values after being cleaned:\n")
employee_df_clean.isnull().sum().to_frame()
Number of null values after being cleaned:
Out[11]:
| 0 | |
|---|---|
| EmployeeId | 0 |
| Age | 0 |
| Attrition | 0 |
| BusinessTravel | 0 |
| DailyRate | 0 |
| Department | 0 |
| DistanceFromHome | 0 |
| Education | 0 |
| EducationField | 0 |
| EnvironmentSatisfaction | 0 |
| Gender | 0 |
| HourlyRate | 0 |
| JobInvolvement | 0 |
| JobLevel | 0 |
| JobRole | 0 |
| JobSatisfaction | 0 |
| MaritalStatus | 0 |
| MonthlyIncome | 0 |
| MonthlyRate | 0 |
| NumCompaniesWorked | 0 |
| OverTime | 0 |
| PercentSalaryHike | 0 |
| PerformanceRating | 0 |
| RelationshipSatisfaction | 0 |
| StockOptionLevel | 0 |
| TotalWorkingYears | 0 |
| TrainingTimesLastYear | 0 |
| WorkLifeBalance | 0 |
| YearsAtCompany | 0 |
| YearsInCurrentRole | 0 |
| YearsSinceLastPromotion | 0 |
| YearsWithCurrManager | 0 |
Outlier¶
In [12]:
# Identify outliers using the Interquartile Range (IQR) method.
from scripts.runPlot_DetectOutliersIQR import plot_detect_outliers_iqr
print("\nOutliers detected using IQR method:\n")
ignore_outlier_cols = ['Attrition']
plot_detect_outliers_iqr(df=employee_df_clean, ignore_cols=ignore_outlier_cols)
Outliers detected using IQR method:
[INFO] No outliers detected in 'Age'.
[INFO] No outliers detected in 'DailyRate'.
[INFO] No outliers detected in 'DistanceFromHome'.
[INFO] No outliers detected in 'HourlyRate'.
[INFO] No outliers detected in 'MonthlyRate'.
[INFO] No outliers detected in 'PercentSalaryHike'.
[OUTLIER] 'MonthlyIncome' — 64 values detected
Q1 = 2900.25, Median = 4903.50, Q3 = 8736.50
Lower bound = -5854.12, Upper bound = 17490.88
Values:
{ 17584: 1x, 17639: 1x, 17650: 1x, 17665: 1x, 17779: 1x, 17856: 1x, 17861: 2x,
17924: 1x, 18041: 1x, 18061: 1x, 18172: 1x, 18200: 1x, 18213: 1x, 18265: 1x,
18300: 1x, 18303: 1x, 18430: 1x, 18606: 1x, 18665: 1x, 18711: 1x, 18722: 1x,
18740: 1x, 18789: 1x, 18824: 1x, 18844: 1x, 18947: 1x, 19033: 1x, 19045: 1x,
19068: 1x, 19081: 1x, 19094: 1x, 19141: 1x, 19144: 1x, 19187: 1x, 19189: 1x,
19190: 1x, 19197: 1x, 19202: 1x, 19232: 1x, 19237: 1x, 19246: 1x, 19272: 1x,
19406: 1x, 19419: 1x, 19436: 1x, 19502: 1x, 19513: 1x, 19517: 1x, 19537: 1x,
19545: 1x, 19566: 1x, 19626: 1x, 19627: 1x, 19701: 1x, 19717: 1x, 19740: 1x,
19845: 1x, 19847: 1x, 19859: 1x, 19926: 1x, 19943: 1x, 19973: 1x, 19999: 1x }
[OUTLIER] 'NumCompaniesWorked' — 43 values detected
Q1 = 1.00, Median = 2.00, Q3 = 4.00
Lower bound = -3.50, Upper bound = 8.50
Values:
{ 9: 43x }
[OUTLIER] 'TotalWorkingYears' — 29 values detected
Q1 = 6.00, Median = 10.00, Q3 = 16.00
Lower bound = -9.00, Upper bound = 31.00
Values:
{ 32: 7x, 33: 7x, 34: 3x, 35: 2x, 36: 4x, 37: 3x, 38: 1x, 40: 2x }
[OUTLIER] 'TrainingTimesLastYear' — 174 values detected
Q1 = 2.00, Median = 3.00, Q3 = 3.00
Lower bound = 0.50, Upper bound = 4.50
Values:
{ 0: 43x, 5: 87x, 6: 44x }
[OUTLIER] 'YearsAtCompany' — 79 values detected
Q1 = 3.00, Median = 5.00, Q3 = 9.00
Lower bound = -6.00, Upper bound = 18.00
Values:
{ 19: 9x, 20: 20x, 21: 11x, 22: 9x, 24: 6x, 25: 4x, 26: 3x, 27: 2x, 29: 2x, 31:
3x, 32: 2x, 33: 4x, 34: 1x, 36: 1x, 37: 1x, 40: 1x }
[OUTLIER] 'YearsInCurrentRole' — 15 values detected
Q1 = 2.00, Median = 3.00, Q3 = 7.00
Lower bound = -5.50, Upper bound = 14.50
Values:
{ 15: 5x, 16: 6x, 17: 2x, 18: 2x }
[OUTLIER] 'YearsSinceLastPromotion' — 80 values detected
Q1 = 0.00, Median = 1.00, Q3 = 3.00
Lower bound = -4.50, Upper bound = 7.50
Values:
{ 8: 11x, 9: 14x, 10: 5x, 11: 21x, 12: 6x, 13: 7x, 14: 3x, 15: 13x }
[OUTLIER] 'YearsWithCurrManager' — 12 values detected
Q1 = 2.00, Median = 3.00, Q3 = 7.00
Lower bound = -5.50, Upper bound = 14.50
Values:
{ 15: 5x, 16: 1x, 17: 6x }
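`plot_detect_outliers_iqr` is a project-specific script; the bounds it reports follow the standard IQR rule (Q1 − 1.5·IQR, Q3 + 1.5·IQR). A minimal standalone sketch of the same detection:

```python
import pandas as pd

def iqr_bounds(s: pd.Series, k: float = 1.5):
    """Return (lower, upper) whisker bounds under the IQR rule."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

s = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 5, 100])  # 100 is an obvious outlier
lo, hi = iqr_bounds(s)
outliers = s[(s < lo) | (s > hi)]
print(outliers.tolist())  # [100]
```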
Creating a new dataframe for plotting¶
In [13]:
employee_plot = employee_df_clean.copy()
# list of numeric columns
numeric_cols = employee_df_clean.select_dtypes(include=['number']).columns.tolist()
numeric_cols = pd.Index(numeric_cols)
print(numeric_cols)
# list of category columns
categorical_cols = employee_df_clean.select_dtypes(include=['object', 'category', 'bool']).columns.tolist()
categorical_cols.remove('EmployeeId')
categorical_cols = pd.Index(categorical_cols)
print(categorical_cols)
label_map = {
0: 'Active',
1: 'Resigned'
}
education_map = {
"1": "1-Below College",
"2": "2-College",
"3": "3-Bachelor",
"4": "4-Master",
"5": "5-Doctor"
}
env_satisfaction_map = {
"1": "1-Low",
"2": "2-Medium",
"3": "3-High",
"4": "4-Very High"
}
job_involvement_map = {
"1": "1-Low",
"2": "2-Medium",
"3": "3-High",
"4": "4-Very High"
}
job_level_map = {
"1": "1-Entry Level",
"2": "2-Junior",
"3": "3-Mid Level",
"4": "4-Senior",
"5": "5-Executive/Top Level"
}
job_satisfaction_map = {
"1": "1-Low",
"2": "2-Medium",
"3": "3-High",
"4": "4-Very High"
}
performance_rating_map = {
"1": "1-Low",
"2": "2-Good",
"3": "3-Excellent",
"4": "4-Outstanding"
}
relationship_satisfaction_map = {
"1": "1-Low",
"2": "2-Medium",
"3": "3-High",
"4": "4-Very High"
}
stock_option_level_map = {
"0": "0-None",
"1": "1-Low",
"2": "2-Medium",
"3": "3-High"
}
work_life_balance_map = {
"1": "1-Low",
"2": "2-Good",
"3": "3-Excellent",
"4": "4-Outstanding"
}
employee_plot["Education"] = employee_plot["Education"].astype(str).map(education_map)
employee_plot["EnvironmentSatisfaction"] = employee_plot["EnvironmentSatisfaction"].astype(str).map(env_satisfaction_map)
employee_plot["JobLevel"] = employee_plot["JobLevel"].astype(str).map(job_level_map)
employee_plot["JobInvolvement"] = employee_plot["JobInvolvement"].astype(str).map(job_involvement_map)
employee_plot["JobSatisfaction"] = employee_plot["JobSatisfaction"].astype(str).map(job_satisfaction_map)
employee_plot["PerformanceRating"] = employee_plot["PerformanceRating"].astype(str).map(performance_rating_map)
employee_plot["RelationshipSatisfaction"] = employee_plot["RelationshipSatisfaction"].astype(str).map(relationship_satisfaction_map)
employee_plot["StockOptionLevel"] = employee_plot["StockOptionLevel"].astype(str).map(stock_option_level_map)
employee_plot["WorkLifeBalance"] = employee_plot["WorkLifeBalance"].astype(str).map(work_life_balance_map)
employee_plot.info()
Index(['Age', 'Attrition', 'DailyRate', 'DistanceFromHome', 'HourlyRate',
'MonthlyIncome', 'MonthlyRate', 'NumCompaniesWorked',
'PercentSalaryHike', 'TotalWorkingYears', 'TrainingTimesLastYear',
'YearsAtCompany', 'YearsInCurrentRole', 'YearsSinceLastPromotion',
'YearsWithCurrManager'],
dtype='object')
Index(['BusinessTravel', 'Department', 'Education', 'EducationField',
'EnvironmentSatisfaction', 'Gender', 'JobInvolvement', 'JobLevel',
'JobRole', 'JobSatisfaction', 'MaritalStatus', 'OverTime',
'PerformanceRating', 'RelationshipSatisfaction', 'StockOptionLevel',
'WorkLifeBalance'],
dtype='object')
<class 'pandas.core.frame.DataFrame'>
Index: 1058 entries, 1 to 1469
Data columns (total 32 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 EmployeeId 1058 non-null object
1 Age 1058 non-null int64
2 Attrition 1058 non-null int32
3 BusinessTravel 1058 non-null category
4 DailyRate 1058 non-null int64
5 Department 1058 non-null category
6 DistanceFromHome 1058 non-null int64
7 Education 1058 non-null object
8 EducationField 1058 non-null category
9 EnvironmentSatisfaction 1058 non-null object
10 Gender 1058 non-null category
11 HourlyRate 1058 non-null int64
12 JobInvolvement 1058 non-null object
13 JobLevel 1058 non-null object
14 JobRole 1058 non-null category
15 JobSatisfaction 1058 non-null object
16 MaritalStatus 1058 non-null category
17 MonthlyIncome 1058 non-null int64
18 MonthlyRate 1058 non-null int64
19 NumCompaniesWorked 1058 non-null int64
20 OverTime 1058 non-null category
21 PercentSalaryHike 1058 non-null int64
22 PerformanceRating 1058 non-null object
23 RelationshipSatisfaction 1058 non-null object
24 StockOptionLevel 1058 non-null object
25 TotalWorkingYears 1058 non-null int64
26 TrainingTimesLastYear 1058 non-null int64
27 WorkLifeBalance 1058 non-null object
28 YearsAtCompany 1058 non-null int64
29 YearsInCurrentRole 1058 non-null int64
30 YearsSinceLastPromotion 1058 non-null int64
31 YearsWithCurrManager 1058 non-null int64
dtypes: category(7), int32(1), int64(14), object(10)
memory usage: 251.5+ KB
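The nine `astype(str).map(...)` assignments above all repeat one pattern; they could be driven from a single dict of dicts. A sketch on a toy frame (the mappings are abbreviated from the notebook's own):

```python
import pandas as pd

# Toy frame standing in for employee_plot; column names match the notebook.
df = pd.DataFrame({"Education": [1, 3, 5], "WorkLifeBalance": [2, 4, 1]})

# One dict of dicts replaces the repeated astype(str).map(...) calls.
column_maps = {
    "Education": {"1": "1-Below College", "3": "3-Bachelor", "5": "5-Doctor"},
    "WorkLifeBalance": {"1": "1-Low", "2": "2-Good", "4": "4-Outstanding"},
}
for col, mapping in column_maps.items():
    df[col] = df[col].astype(str).map(mapping)
print(df["Education"].tolist())  # ['1-Below College', '3-Bachelor', '5-Doctor']
```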
Descriptive Statistics "employee_df"¶
In [14]:
# Descriptive statistics for numeric columns
print("\nDescriptive statistics for numeric columns:")
display(employee_df[numeric_cols].describe())
# Descriptive statistics for categorical columns
print("\nDescriptive statistics for categorical columns:")
display(employee_df[categorical_cols].describe(include='all'))
Descriptive statistics for numeric columns:
| Age | Attrition | DailyRate | DistanceFromHome | HourlyRate | MonthlyIncome | MonthlyRate | NumCompaniesWorked | PercentSalaryHike | TotalWorkingYears | TrainingTimesLastYear | YearsAtCompany | YearsInCurrentRole | YearsSinceLastPromotion | YearsWithCurrManager | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1470.000000 | 1058.000000 | 1470.000000 | 1470.000000 | 1470.000000 | 1470.000000 | 1470.000000 | 1470.000000 | 1470.000000 | 1470.000000 | 1470.000000 | 1470.000000 | 1470.000000 | 1470.000000 | 1470.000000 |
| mean | 36.923810 | 0.169187 | 802.485714 | 9.192517 | 65.891156 | 6502.931293 | 14313.103401 | 2.693197 | 15.209524 | 11.279592 | 2.799320 | 7.008163 | 4.229252 | 2.187755 | 4.123129 |
| std | 9.135373 | 0.375094 | 403.509100 | 8.106864 | 20.329428 | 4707.956783 | 7117.786044 | 2.498009 | 3.659938 | 7.780782 | 1.289271 | 6.126525 | 3.623137 | 3.222430 | 3.568136 |
| min | 18.000000 | 0.000000 | 102.000000 | 1.000000 | 30.000000 | 1009.000000 | 2094.000000 | 0.000000 | 11.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 30.000000 | 0.000000 | 465.000000 | 2.000000 | 48.000000 | 2911.000000 | 8047.000000 | 1.000000 | 12.000000 | 6.000000 | 2.000000 | 3.000000 | 2.000000 | 0.000000 | 2.000000 |
| 50% | 36.000000 | 0.000000 | 802.000000 | 7.000000 | 66.000000 | 4919.000000 | 14235.500000 | 2.000000 | 14.000000 | 10.000000 | 3.000000 | 5.000000 | 3.000000 | 1.000000 | 3.000000 |
| 75% | 43.000000 | 0.000000 | 1157.000000 | 14.000000 | 83.750000 | 8379.000000 | 20461.500000 | 4.000000 | 18.000000 | 15.000000 | 3.000000 | 9.000000 | 7.000000 | 3.000000 | 7.000000 |
| max | 60.000000 | 1.000000 | 1499.000000 | 29.000000 | 100.000000 | 19999.000000 | 26999.000000 | 9.000000 | 25.000000 | 40.000000 | 6.000000 | 40.000000 | 18.000000 | 15.000000 | 17.000000 |
Descriptive statistics for categorical columns:
| BusinessTravel | Department | Education | EducationField | EnvironmentSatisfaction | Gender | JobInvolvement | JobLevel | JobRole | JobSatisfaction | MaritalStatus | OverTime | PerformanceRating | RelationshipSatisfaction | StockOptionLevel | WorkLifeBalance | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1470 | 1470 | 1470 | 1470 | 1470 | 1470 | 1470 | 1470 | 1470 | 1470 | 1470 | 1470 | 1470 | 1470 | 1470 | 1470 |
| unique | 3 | 3 | 5 | 6 | 4 | 2 | 4 | 5 | 9 | 4 | 3 | 2 | 2 | 4 | 4 | 4 |
| top | Travel_Rarely | Research & Development | 3 | Life Sciences | 3 | Male | 3 | 1 | Sales Executive | 4 | Married | No | 3 | 3 | 0 | 3 |
| freq | 1043 | 961 | 572 | 606 | 453 | 882 | 868 | 543 | 326 | 459 | 673 | 1054 | 1244 | 459 | 631 | 893 |
Plot data¶
Histogram Plot¶
In [15]:
from scripts.runPlot_ObsHistNumeric import plot_obs_histnums
plot_obs_histnums(df=employee_plot, numeric_cols=numeric_cols, n_cols=4, color='salmon')
In [16]:
from scripts.runPlot_ObsHistCatgs import plot_obs_histcatgs
plot_obs_histcatgs(df=employee_plot, categorical_cols=categorical_cols, n_cols=4)
Pair plot¶
In [17]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Creating a scatter matrix (pair plot)
g = sns.pairplot(employee_plot, vars=numeric_cols)
# Manually resize the figure
g.fig.set_size_inches(20, 20)  # (width, height)
plt.show()
Correlation Heatmap¶
In [18]:
from scripts.runPlot_ObsCorrHeatmap import plot_obs_corrheatmap
plot_obs_corrheatmap(employee_plot, columns=numeric_cols,figsize=(18, 10))
Box Plot¶
In [19]:
from scripts.runPlot_ObsBoxCols import plot_obs_boxcols
plot_obs_boxcols(employee_plot, numeric_cols=numeric_cols, col_target='Attrition', label_map=label_map, n_cols=4)
Count Plot¶
In [20]:
from scripts.runPlot_ObsCountCols import plot_obs_countcols
plot_obs_countcols(df=employee_plot, categorical_cols=categorical_cols, col_target='Attrition', n_cols=4,legend_show=False,label_map=label_map)
Line plot¶
In [21]:
from scripts.runPlot_ObsLineXYcMetricSubplot import plot_obs_lineXYc_metric_subplot
from scripts.runPlot_ObsHitsXYcMetricSubplot import plot_obs_histXYc_metric_subplot
XcolsList = [col for col in categorical_cols if col != 'Attrition'] + [col for col in numeric_cols if col != 'Attrition']
XcolsList.sort()
print("columns List to plot:")
print(XcolsList)
columns List to plot: ['Age', 'BusinessTravel', 'DailyRate', 'Department', 'DistanceFromHome', 'Education', 'EducationField', 'EnvironmentSatisfaction', 'Gender', 'HourlyRate', 'JobInvolvement', 'JobLevel', 'JobRole', 'JobSatisfaction', 'MaritalStatus', 'MonthlyIncome', 'MonthlyRate', 'NumCompaniesWorked', 'OverTime', 'PercentSalaryHike', 'PerformanceRating', 'RelationshipSatisfaction', 'StockOptionLevel', 'TotalWorkingYears', 'TrainingTimesLastYear', 'WorkLifeBalance', 'YearsAtCompany', 'YearsInCurrentRole', 'YearsSinceLastPromotion', 'YearsWithCurrManager']
In [22]:
YCol = "Attrition"
Xline_smooth = True
rotation_label = 90
metric_types = ["count", "nimax", "rank", "catg_contrib"]
continue_cols = ["Age", "HourlyRate", "DailyRate", "MonthlyIncome", "MonthlyRate"]
metric_types_map = {
"count": "Count Plot",
"nimax": "niMax Normalization Plot",
    "rank": "Ranking Metric Plot",
"catg_contrib": "Percentage Contribution Metric Plot"
}
for metric_type in metric_types:
print(f"------------ Metric Plot: {metric_types_map[metric_type]} ------------")
plot_obs_lineXYc_metric_subplot(
employee_plot,
Xcols=XcolsList,
lineCol=YCol,
n_cols=4,
ignore_cols=None,
label_map=label_map,
line_smooth=Xline_smooth,
rotation_label=rotation_label,
height_per_plot=5,
legend_show=True,
metric_type=metric_type)
plot_obs_histXYc_metric_subplot(
employee_plot,
Xcols=continue_cols,
barCol=YCol,
metric_type=metric_type,
label_map=label_map,
n_cols=2,
width_per_plot=18,
height_per_plot=4,
rotation_label=rotation_label,
bins=30,
legend_show=False)
------------ Metric Plot: Count Plot ------------
------------ Metric Plot: niMax Normalization Plot ------------
------------ Metric Plot: Ranking Metric Plot ------------
------------ Metric Plot: Percentage Contribution Metric Plot ------------
Saving the dataframes to dataset files¶
In [ ]:
employee_df_clean.to_parquet("saved/employee_cleaned.parquet", index=False)
employee_plot.to_csv("saved/employee_plot.csv", index=False)
Modeling¶
Encoding¶
In [ ]:
from sklearn.preprocessing import LabelEncoder
print(categorical_cols)
employee_df_encode = employee_df_clean.copy()
le = LabelEncoder()
for col in categorical_cols:
employee_df_encode[col] = le.fit_transform(employee_df_encode[col])
Index(['BusinessTravel', 'Department', 'Education', 'EducationField',
'EnvironmentSatisfaction', 'Gender', 'JobInvolvement', 'JobLevel',
'JobRole', 'JobSatisfaction', 'MaritalStatus', 'OverTime',
'PerformanceRating', 'RelationshipSatisfaction', 'StockOptionLevel',
'WorkLifeBalance'],
dtype='object')
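Note that reusing one `LabelEncoder` instance overwrites its `classes_` on every iteration, so after the loop the fitted mappings for all but the last column are lost. A sketch that keeps one encoder per column, preserving `inverse_transform`:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame with two categorical columns from the dataset.
df = pd.DataFrame({"OverTime": ["Yes", "No", "Yes"],
                   "Gender": ["Male", "Female", "Male"]})

# Fit one encoder per column so every mapping stays recoverable.
encoders = {}
for col in df.columns:
    enc = LabelEncoder()
    df[col] = enc.fit_transform(df[col])
    encoders[col] = enc

# Recover the original labels from the stored encoder.
print(encoders["OverTime"].inverse_transform(df["OverTime"]))
```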
Split feature data and target data¶
In [ ]:
X = employee_df_encode.drop(columns=['EmployeeId','Attrition'])
y = employee_df_encode['Attrition']
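The `scheme` option passed to the model scripts below appears to control the train/test split (an assumption; the scripts are project-specific). Since only about 17% of employees resigned, a stratified split keeps that ratio in both sets. A sketch on synthetic data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic imbalanced target (~17% positives), mirroring the Attrition ratio.
y = np.array([0] * 83 + [1] * 17)
X = np.arange(100).reshape(-1, 1)

# stratify=y preserves the class ratio in both the train and test sets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=24
)
print(y_te.mean())  # close to 0.17 in the test set as well
```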
Building a Model¶
Import helper functions to support the process¶
In [ ]:
from scripts.runModelSuprt import *
resultRunModel_df = newDf_logModel()
models_path = "__models__/"
Running the Model¶
Logistic Regression¶
In [ ]:
# import script runModel
from scripts.runModel_LogRegression import *
# running several model trainings
#------------------
setModelRun = {
"IdRun": "LR01",
"options": {"scheme":20},
"Params": {"modelParams":{"random_state": 24}}
}
# Simulation model, runModel_LogRegression
setModel_Output, model_output = runModel_LogRegression(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "Logistic Regression")
#------------------
setModelRun = {
"IdRun": "LR02",
"options": {"scheme":30},
"Params": {"modelParams":{"random_state": 24}}
}
# Simulation model, runModel_LogRegression
setModel_Output, model_output = runModel_LogRegression(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "Logistic Regression")
#------------------
setModelRun = {
"IdRun": "LR03",
"options": {"scheme":30,"gridSCV": True},
"Params": {"modelParams":{"random_state": 24}}
}
# Simulation model, runModel_LogRegression
setModel_Output, model_output = runModel_LogRegression(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "Logistic Regression")
#------------------
setModelRun = {
"IdRun": "LR04",
"options": {"scheme":30,"randSCV": True},
"Params": {"modelParams":{"random_state": 24}}
}
# Simulation model, runModel_LogRegression
setModel_Output, model_output = runModel_LogRegression(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "Logistic Regression")
| ModelName | IdRun | DateAction | options | Params | Remarks | AccuracyTrain | AccuracyTest | F1ScoreTest | PrecisionTest | RecallTest | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Logistic Regression | LR01 | 2025-05-08 03:01:02 | {'scheme': 20} | {'modelParams': {'random_state': 24}} | NaN | 0.836879 | 0.820755 | 0.744482 | 0.853036 | 0.820755 |
| 1 | Logistic Regression | LR02 | 2025-05-08 03:01:02 | {'scheme': 30} | {'modelParams': {'random_state': 24}} | NaN | 0.833784 | 0.836478 | 0.770531 | 0.863472 | 0.836478 |
| 2 | Logistic Regression | LR03 | 2025-05-08 03:01:07 | {'scheme': 30, 'gridSCV': True} | {'modelParams': {'random_state': 24, 'n_jobs': 2}, 'BestParamsGridSCV': {'C': 0.1, 'penalty': 'l2', 'solver': 'liblinear'}} | NaN | 0.862162 | 0.880503 | 0.852995 | 0.895589 | 0.880503 |
| 3 | Logistic Regression | LR04 | 2025-05-08 03:01:10 | {'scheme': 30, 'randSCV': True} | {'modelParams': {'max_iter': 500, 'random_state': 24}, 'BestParamsRandSCV': {'C': 0.13292918943162169, 'penalty': 'l1', 'solver': 'liblinear'}} | NaN | 0.855405 | 0.874214 | 0.845258 | 0.882668 | 0.874214 |
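`runModel_LogRegression` is a project script, but LR03's `BestParamsGridSCV` values suggest a grid search over `C`, `penalty`, and `solver`. In plain scikit-learn such a search might look roughly like this (a sketch on synthetic data, not the script itself; the grid values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic imbalanced data standing in for the encoded employee features.
X, y = make_classification(n_samples=300, weights=[0.83], random_state=24)

# Grid over the same hyperparameter names reported for LR03.
param_grid = {"C": [0.01, 0.1, 1, 10],
              "penalty": ["l1", "l2"],
              "solver": ["liblinear"]}
search = GridSearchCV(LogisticRegression(random_state=24),
                      param_grid, cv=5, n_jobs=2)
search.fit(X, y)
print(search.best_params_)
```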
Random Forest¶
In [ ]:
# import script runModel
from scripts.runModel_RandomForest import *
# Running several training models
#------------------
setModelRun = {
"IdRun": "RF01",
"options": {"scheme":20},
"Params": {"modelParams":{"random_state": 24}}
}
# Model simulation, runModel_RandomForest
setModel_Output, model_output = runModel_RandomForest(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "Random Forest")
#------------------
setModelRun = {
"IdRun": "RF02",
"options": {"scheme":30},
"Params": {"modelParams":{"random_state": 24}}
}
# Model simulation, runModel_RandomForest
setModel_Output, model_output = runModel_RandomForest(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "Random Forest")
#------------------
setModelRun = {
"IdRun": "RF03",
"options": {"scheme":30,"gridSCV": True},
}
# Model simulation, runModel_RandomForest
setModel_Output, model_output = runModel_RandomForest(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "Random Forest")
#------------------
setModelRun = {
"IdRun": "RF04",
"options": {"scheme":30,"randSCV": True},
}
# Model simulation, runModel_RandomForest
setModel_Output, model_output = runModel_RandomForest(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "Random Forest")
| ModelName | IdRun | DateAction | options | Params | Remarks | AccuracyTrain | AccuracyTest | F1ScoreTest | PrecisionTest | RecallTest | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | Random Forest | RF01 | 2025-05-08 03:01:11 | {'scheme': 20} | {'modelParams': {'n_jobs': 2, 'random_state': 24}} | NaN | 1.000000 | 0.849057 | 0.802944 | 0.872618 | 0.849057 |
| 5 | Random Forest | RF02 | 2025-05-08 03:01:12 | {'scheme': 30} | {'modelParams': {'n_jobs': 2, 'random_state': 24}} | NaN | 1.000000 | 0.861635 | 0.820854 | 0.881466 | 0.861635 |
| 6 | Random Forest | RF03 | 2025-05-08 03:06:12 | {'scheme': 30, 'gridSCV': True} | {'modelParams': {'random_state': 42, 'n_jobs': 2}, 'BestParamsGridSCV': {'max_depth': 20, 'max_features': 0.5, 'min_samples_leaf': 1, 'min_samples_split': 15, 'n_estimators': 500}} | NaN | 0.935135 | 0.855346 | 0.812711 | 0.863049 | 0.855346 |
| 7 | Random Forest | RF04 | 2025-05-08 03:06:18 | {'scheme': 30, 'randSCV': True} | {'modelParams': {'random_state': 42, 'n_jobs': 2}, 'BestParamsRandSCV': {'n_estimators': 50, 'min_samples_split': 2, 'max_depth': 10}} | NaN | 0.993243 | 0.849057 | 0.800881 | 0.854986 | 0.849057 |
Decision Tree¶
In [ ]:
# import script runModel
from scripts.runModel_DecisionTree import *
# Running several training models
#------------------
setModelRun = {
"IdRun": "DF01",
"options": {"scheme":20},
}
# Model simulation, runModel_DecisionTree
setModel_Output, model_output = runModel_DecisionTree(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "DecisionTree Classifier")
#------------------
setModelRun = {
"IdRun": "DF02",
"options": {"scheme":30},
}
# Model simulation, runModel_DecisionTree
setModel_Output, model_output = runModel_DecisionTree(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "DecisionTree Classifier")
#------------------
setModelRun = {
"IdRun": "DF03",
"options": {"scheme":30,"gridSCV": True},
}
# Model simulation, runModel_DecisionTree
setModel_Output, model_output = runModel_DecisionTree(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "DecisionTree Classifier")
#------------------
setModelRun = {
"IdRun": "DF04",
"options": {"scheme":30,"randSCV": True},
}
# Model simulation, runModel_DecisionTree
setModel_Output, model_output = runModel_DecisionTree(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "DecisionTree Classifier")
| ModelName | IdRun | DateAction | options | Params | Remarks | AccuracyTrain | AccuracyTest | F1ScoreTest | PrecisionTest | RecallTest | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | DecisionTree Classifier | DF01 | 2025-05-08 03:06:18 | {'scheme': 20} | {'modelParams': {'random_state': 42}} | NaN | 1.000000 | 0.764151 | 0.750122 | 0.739353 | 0.764151 |
| 1 | DecisionTree Classifier | DF02 | 2025-05-08 03:06:18 | {'scheme': 30} | {'modelParams': {'random_state': 42}} | NaN | 1.000000 | 0.783019 | 0.783790 | 0.784576 | 0.783019 |
| 2 | DecisionTree Classifier | DF03 | 2025-05-08 03:06:23 | {'scheme': 30, 'gridSCV': True} | {'modelParams': {'random_state': 42}, 'BestParamsGridSCV': {'criterion': 'gini', 'max_depth': 10, 'min_samples_leaf': 2, 'min_samples_split': 2}} | NaN | 0.964865 | 0.811321 | 0.793812 | 0.784271 | 0.811321 |
| 3 | DecisionTree Classifier | DF04 | 2025-05-08 03:06:23 | {'scheme': 30, 'randSCV': True} | {'modelParams': {'random_state': 42}, 'BestParamsRandSCV': {'criterion': 'entropy', 'max_depth': 2, 'min_samples_leaf': 8, 'min_samples_split': 13}} | NaN | 0.836486 | 0.858491 | 0.838929 | 0.841739 | 0.858491 |
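Note the overfitting signature in DF01/DF02: 1.000 train accuracy against 0.76-0.78 test accuracy, while the randomized search (DF04) landed on a heavily pruned tree (`max_depth` 2) and the best test score of the four. A small sketch of that comparison, using a hypothetical helper rather than the notebook's scripts, with `max_depth=2` taken from the DF04 result above:

```python
# Sketch: compare an unconstrained decision tree against a depth-limited
# one on the same split. Hypothetical helper; max_depth=2 mirrors the
# DF04 randomized-search result.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def compare_tree_depths(X, y, max_depth=2, random_state=42):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=random_state)
    scores = {}
    for name, depth in [("unconstrained", None), ("pruned", max_depth)]:
        tree = DecisionTreeClassifier(max_depth=depth,
                                      random_state=random_state)
        tree.fit(X_tr, y_tr)
        scores[name] = {"train": tree.score(X_tr, y_tr),
                        "test": tree.score(X_te, y_te)}
    return scores
```

An unconstrained tree typically memorizes the training set (train accuracy 1.0); the gap between its train and test scores is the overfitting that the `AccuracyTrain` column is meant to flag.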
AdaBoost (Adaptive Boosting)¶
In [ ]:
from scripts.runModel_Adaboost import *
# Running several training models
#------------------
setModelRun = {
"IdRun": "AD01",
"options": {"scheme":20},
}
# Model simulation, runModel_Adaboost
setModel_Output, model_output = runModel_Adaboost(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "Adaboost Classifier")
#------------------
setModelRun = {
"IdRun": "AD02",
"options": {"scheme":30},
}
# Model simulation, runModel_Adaboost
setModel_Output, model_output = runModel_Adaboost(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "Adaboost Classifier")
#------------------
setModelRun = {
"IdRun": "AD03",
"options": {"scheme":30,"gridSCV": True},
}
# Model simulation, runModel_Adaboost
setModel_Output, model_output = runModel_Adaboost(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "Adaboost Classifier")
#------------------
setModelRun = {
"IdRun": "AD04",
"options": {"scheme":30,"randSCV": True},
}
# Model simulation, runModel_Adaboost
setModel_Output, model_output = runModel_Adaboost(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "Adaboost Classifier")
| ModelName | IdRun | DateAction | options | Params | Remarks | AccuracyTrain | AccuracyTest | F1ScoreTest | PrecisionTest | RecallTest | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Adaboost Classifier | AD01 | 2025-05-08 03:06:24 | {'scheme': 20} | {'modelParams': {'random_state': 42}} | NaN | 0.875887 | 0.853774 | 0.834035 | 0.838556 | 0.853774 |
| 1 | Adaboost Classifier | AD02 | 2025-05-08 03:06:24 | {'scheme': 30} | {'modelParams': {'random_state': 42}} | NaN | 0.870270 | 0.867925 | 0.840028 | 0.864816 | 0.867925 |
| 2 | Adaboost Classifier | AD03 | 2025-05-08 03:06:39 | {'scheme': 30, 'gridSCV': True} | {'modelParams': {'random_state': 42}, 'BestParamsGridSCV': {'learning_rate': 1.0, 'n_estimators': 50}} | NaN | 0.870270 | 0.867925 | 0.840028 | 0.864816 | 0.867925 |
| 3 | Adaboost Classifier | AD04 | 2025-05-08 03:06:50 | {'scheme': 30, 'randSCV': True} | {'modelParams': {'random_state': 42}, 'BestParamsRandSCV': {'learning_rate': 0.7319987722668247, 'n_estimators': 79}} | NaN | 0.871622 | 0.874214 | 0.845258 | 0.882668 | 0.874214 |
Gradient Boosting¶
In [ ]:
from scripts.runModel_GradientBoosting import *
# Running several training models
#------------------
setModelRun = {
"IdRun": "GB01",
"options": {"scheme":20},
}
# Model simulation, runModel_GradientBoosting
setModel_Output, model_output = runModel_GradientBoosting(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "Gradient Boosting")
#------------------
setModelRun = {
"IdRun": "GB02",
"options": {"scheme":30},
}
# Model simulation, runModel_GradientBoosting
setModel_Output, model_output = runModel_GradientBoosting(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "Gradient Boosting")
#------------------
setModelRun = {
"IdRun": "GB03",
"options": {"scheme":30,"gridSCV": True},
}
# Model simulation, runModel_GradientBoosting
setModel_Output, model_output = runModel_GradientBoosting(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "Gradient Boosting")
#------------------
setModelRun = {
"IdRun": "GB04",
"options": {"scheme":30,"randSCV": True},
}
# Model simulation, runModel_GradientBoosting
setModel_Output, model_output = runModel_GradientBoosting(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "Gradient Boosting")
| ModelName | IdRun | DateAction | options | Params | Remarks | AccuracyTrain | AccuracyTest | F1ScoreTest | PrecisionTest | RecallTest | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | Gradient Boosting | GB01 | 2025-05-08 03:06:51 | {'scheme': 20} | {'modelParams': {'random_state': 42}} | NaN | 0.971631 | 0.849057 | 0.816596 | 0.838627 | 0.849057 |
| 9 | Gradient Boosting | GB02 | 2025-05-08 03:06:51 | {'scheme': 30} | {'modelParams': {'random_state': 42}} | NaN | 0.977027 | 0.852201 | 0.824893 | 0.833206 | 0.852201 |
| 10 | Gradient Boosting | GB03 | 2025-05-08 03:08:31 | {'scheme': 30, 'gridSCV': True} | {'modelParams': {'random_state': 42}, 'BestParamsGridSCV': {'learning_rate': 0.5, 'max_depth': 5, 'n_estimators': 50}} | NaN | 1.000000 | 0.858491 | 0.840891 | 0.841839 | 0.858491 |
| 11 | Gradient Boosting | GB04 | 2025-05-08 03:08:46 | {'scheme': 30, 'randSCV': True} | {'modelParams': {'random_state': 42}, 'BestParamsRandSCV': {'learning_rate': 0.19727005942368125, 'max_depth': 3, 'n_estimators': 64}} | NaN | 0.985135 | 0.871069 | 0.851357 | 0.860955 | 0.871069 |
Support Vector Machines (SVM)¶
In [ ]:
from scripts.runModel_SVM import *
# Running several training models
#------------------
setModelRun = {
"IdRun": "SV01",
"options": {"scheme":20},
}
# Model simulation, runModel_SVM
setModel_Output, model_output = runModel_SVM(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "Support Vector Machine")
#------------------
setModelRun = {
"IdRun": "SV02",
"options": {"scheme":30},
}
# Model simulation, runModel_SVM
setModel_Output, model_output = runModel_SVM(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "Support Vector Machine")
| ModelName | IdRun | DateAction | options | Params | Remarks | AccuracyTrain | AccuracyTest | F1ScoreTest | PrecisionTest | RecallTest | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | Support Vector Machine | SV01 | 2025-05-08 03:08:47 | {'scheme': 20} | {'modelParams': {'probability': True}} | NaN | 0.834515 | 0.816038 | 0.733374 | 0.665918 | 0.816038 |
| 21 | Support Vector Machine | SV02 | 2025-05-08 03:08:48 | {'scheme': 30} | {'modelParams': {'probability': True}} | NaN | 0.832432 | 0.827044 | 0.748752 | 0.684002 | 0.827044 |
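The SVM scores here are deceptive: the confusion matrix for SV02 in the Evaluation table below is [[263, 0], [55, 0]], i.e. the model predicts only the majority class, which distance-based models (SVM, KNN, MLP) often do on unscaled features. A hedged sketch of one common remedy, assuming the `runModel_*` scripts do not already standardize `X`:

```python
# Sketch: standardize features before scale-sensitive models.
# Hypothetical, assuming X was not already scaled upstream.
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def fit_scaled_svm(X, y, test_size=0.30, random_state=42):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state)
    # The scaler is fit on the training split only, so the test set
    # does not leak into preprocessing.
    pipe = make_pipeline(StandardScaler(),
                         SVC(probability=True, random_state=random_state))
    pipe.fit(X_tr, y_tr)
    return pipe, pipe.score(X_te, y_te)
```

The same `Pipeline` wrapper applies to the KNN and MLP runs below, which show the identical all-majority confusion matrix in some configurations.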
K-Nearest Neighbors (KNN)¶
In [ ]:
from scripts.runModel_KNN import *
# Running several training models
#------------------
setModelRun = {
"IdRun": "KN01",
"options": {"scheme":20},
}
# Model simulation, runModel_KNN
setModel_Output, model_output = runModel_KNN(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "K-Nearest Neighbors")
#------------------
setModelRun = {
"IdRun": "KN02",
"options": {"scheme":30},
}
# Model simulation, runModel_KNN
setModel_Output, model_output = runModel_KNN(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "K-Nearest Neighbors")
#------------------
setModelRun = {
"IdRun": "KN03",
"options": {"scheme":30,"gridSCV": True},
}
# Model simulation, runModel_KNN
setModel_Output, model_output = runModel_KNN(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "K-Nearest Neighbors")
#------------------
setModelRun = {
"IdRun": "KN04",
"options": {"scheme":30,"randSCV": True},
}
# Model simulation, runModel_KNN
setModel_Output, model_output = runModel_KNN(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "K-Nearest Neighbors")
| ModelName | IdRun | DateAction | options | Params | Remarks | AccuracyTrain | AccuracyTest | F1ScoreTest | PrecisionTest | RecallTest | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | K-Nearest Neighbors | KN01 | 2025-05-08 03:08:48 | {'scheme': 20} | {'modelParams': {}} | NaN | 0.855792 | 0.816038 | 0.774000 | 0.773373 | 0.816038 |
| 13 | K-Nearest Neighbors | KN02 | 2025-05-08 03:08:49 | {'scheme': 30} | {'modelParams': {}} | NaN | 0.848649 | 0.814465 | 0.766051 | 0.752787 | 0.814465 |
| 14 | K-Nearest Neighbors | KN03 | 2025-05-08 03:08:50 | {'scheme': 30, 'gridSCV': True} | {'modelParams': {}, 'BestParamsGridSCV': {'metric': 'manhattan', 'n_neighbors': 9, 'weights': 'uniform'}} | NaN | 0.841892 | 0.823899 | 0.747192 | 0.683551 | 0.823899 |
| 15 | K-Nearest Neighbors | KN04 | 2025-05-08 03:08:51 | {'scheme': 30, 'randSCV': True} | {'modelParams': {}, 'BestParamsRandSCV': {'metric': 'manhattan', 'n_neighbors': 14, 'weights': 'uniform'}} | NaN | 0.832432 | 0.827044 | 0.748752 | 0.684002 | 0.827044 |
XGBoost (Extreme Gradient Boosting)¶
In [ ]:
from scripts.runModel_XGBoost import *
# Running several training models
#------------------
setModelRun = {
"IdRun": "XG01",
"options": {"scheme":20},
}
# Model simulation, runModel_XGBoost
setModel_Output, model_output = runModel_XGBoost(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "XGBoost")
#------------------
setModelRun = {
"IdRun": "XG02",
"options": {"scheme":30},
}
# Model simulation, runModel_XGBoost
setModel_Output, model_output = runModel_XGBoost(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "XGBoost")
#------------------
setModelRun = {
"IdRun": "XG03",
"options": {"scheme":30,"gridSCV": True},
}
# Model simulation, runModel_XGBoost
setModel_Output, model_output = runModel_XGBoost(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "XGBoost")
#------------------
setModelRun = {
"IdRun": "XG04",
"options": {"scheme":30,"randSCV": True},
}
# Model simulation, runModel_XGBoost
setModel_Output, model_output = runModel_XGBoost(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "XGBoost")
| ModelName | IdRun | DateAction | options | Params | Remarks | AccuracyTrain | AccuracyTest | F1ScoreTest | PrecisionTest | RecallTest | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 26 | XGBoost | XG01 | 2025-05-08 03:08:52 | {'scheme': 20} | {'modelParams': {'random_state': 42, 'use_label_encoder': False, 'eval_metric': 'mlogloss'}} | NaN | 1.000000 | 0.839623 | 0.816353 | 0.818064 | 0.839623 |
| 27 | XGBoost | XG02 | 2025-05-08 03:08:52 | {'scheme': 30} | {'modelParams': {'random_state': 42, 'use_label_encoder': False, 'eval_metric': 'mlogloss'}} | NaN | 1.000000 | 0.855346 | 0.836366 | 0.837337 | 0.855346 |
| 28 | XGBoost | XG03 | 2025-05-08 03:09:52 | {'scheme': 30, 'gridSCV': True} | {'modelParams': {'random_state': 42, 'use_label_encoder': False, 'eval_metric': 'mlogloss'}, 'BestParamsGridSCV': {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 50, 'subsample': 0.8}} | NaN | 0.914865 | 0.874214 | 0.849896 | 0.872260 | 0.874214 |
| 29 | XGBoost | XG04 | 2025-05-08 03:10:01 | {'scheme': 30, 'randSCV': True} | {'modelParams': {'random_state': 42, 'use_label_encoder': False, 'eval_metric': 'mlogloss'}, 'BestParamsRandSCV': {'learning_rate': 0.29156581270472504, 'max_depth': 4, 'n_estimators': 70, 'subsample': 0.8852444528883149}} | NaN | 1.000000 | 0.874214 | 0.855917 | 0.864935 | 0.874214 |
Multi-Layer Perceptron (MLP) – Neural Network¶
In [ ]:
from scripts.runModel_MLP import *
# Running several training models
#------------------
setModelRun = {
"IdRun": "MP01",
"options": {"scheme":20},
}
# Model simulation, runModel_MLP
setModel_Output, model_output = runModel_MLP(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "Multi-Layer Perceptron")
#------------------
setModelRun = {
"IdRun": "MP02",
"options": {"scheme":30},
}
# Model simulation, runModel_MLP
setModel_Output, model_output = runModel_MLP(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "Multi-Layer Perceptron")
#------------------
setModelRun = {
"IdRun": "MP03",
"options": {"scheme":30,"gridSCV": True},
}
# Model simulation, runModel_MLP
setModel_Output, model_output = runModel_MLP(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "Multi-Layer Perceptron")
#------------------
setModelRun = {
"IdRun": "MP04",
"options": {"scheme":30,"randSCV": True},
}
# Model simulation, runModel_MLP
setModel_Output, model_output = runModel_MLP(setModelRun, X, y)
# Set the Log dataframe and save the model
resultRunModel_df = updateRunModelDf(df=resultRunModel_df, setx=setModel_Output, model=model_output, models_path=models_path)
showResultDfModel(resultRunModel_df, ModelName = "Multi-Layer Perceptron")
| ModelName | IdRun | DateAction | options | Params | Remarks | AccuracyTrain | AccuracyTest | F1ScoreTest | PrecisionTest | RecallTest | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | Multi-Layer Perceptron | MP01 | 2025-05-08 03:10:02 | {'scheme': 20} | {'modelParams': {'max_iter': 300, 'random_state': 42}} | NaN | 0.822695 | 0.811321 | 0.753680 | 0.751575 | 0.811321 |
| 21 | Multi-Layer Perceptron | MP02 | 2025-05-08 03:10:02 | {'scheme': 30} | {'modelParams': {'max_iter': 300, 'random_state': 42}} | NaN | 0.831081 | 0.817610 | 0.744053 | 0.682640 | 0.817610 |
| 22 | Multi-Layer Perceptron | MP03 | 2025-05-08 03:10:59 | {'scheme': 30, 'gridSCV': True} | {'modelParams': {'max_iter': 300, 'random_state': 42}, 'BestParamsGridSCV': {'activation': 'tanh', 'alpha': 0.0001, 'hidden_layer_sizes': (100,), 'solver': 'adam'}} | NaN | 0.832432 | 0.827044 | 0.748752 | 0.684002 | 0.827044 |
| 23 | Multi-Layer Perceptron | MP04 | 2025-05-08 03:11:16 | {'scheme': 30, 'randSCV': True} | {'modelParams': {'max_iter': 300, 'random_state': 42}, 'BestParamsRandSCV': {'activation': 'tanh', 'alpha': 0.006086584841970367, 'hidden_layer_sizes': (30, 30, 30), 'solver': 'sgd'}} | NaN | 0.832432 | 0.827044 | 0.748752 | 0.684002 | 0.827044 |
Evaluation¶
In [ ]:
resultRunModel_df
Out[ ]:
| ModelName | IdRun | DateAction | options | Params | Remarks | AccuracyTrain | AccuracyTest | F1ScoreTest | PrecisionTest | RecallTest | ConfusionMatrix | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Adaboost Classifier | AD01 | 2025-05-08 03:06:24 | {'scheme': 20} | {'modelParams': {'random_state': 42}} | NaN | 0.875887 | 0.853774 | 0.834035 | 0.838556 | 0.853774 | [[167, 6], [25, 14]] |
| 1 | Adaboost Classifier | AD02 | 2025-05-08 03:06:24 | {'scheme': 30} | {'modelParams': {'random_state': 42}} | NaN | 0.870270 | 0.867925 | 0.840028 | 0.864816 | 0.867925 | [[260, 3], [39, 16]] |
| 2 | Adaboost Classifier | AD03 | 2025-05-08 03:06:39 | {'scheme': 30, 'gridSCV': True} | {'modelParams': {'random_state': 42}, 'BestParamsGridSCV': {'learning_rate': 1.0, 'n_estimators': 50}} | NaN | 0.870270 | 0.867925 | 0.840028 | 0.864816 | 0.867925 | [[260, 3], [39, 16]] |
| 3 | Adaboost Classifier | AD04 | 2025-05-08 03:06:50 | {'scheme': 30, 'randSCV': True} | {'modelParams': {'random_state': 42}, 'BestParamsRandSCV': {'learning_rate': 0.7319987722668247, 'n_estimators': 79}} | NaN | 0.871622 | 0.874214 | 0.845258 | 0.882668 | 0.874214 | [[262, 1], [39, 16]] |
| 4 | DecisionTree Classifier | DF01 | 2025-05-08 03:06:18 | {'scheme': 20} | {'modelParams': {'random_state': 42}} | NaN | 1.000000 | 0.764151 | 0.750122 | 0.739353 | 0.764151 | [[153, 20], [30, 9]] |
| 5 | DecisionTree Classifier | DF02 | 2025-05-08 03:06:18 | {'scheme': 30} | {'modelParams': {'random_state': 42}} | NaN | 1.000000 | 0.783019 | 0.783790 | 0.784576 | 0.783019 | [[228, 35], [34, 21]] |
| 6 | DecisionTree Classifier | DF03 | 2025-05-08 03:06:23 | {'scheme': 30, 'gridSCV': True} | {'modelParams': {'random_state': 42}, 'BestParamsGridSCV': {'criterion': 'gini', 'max_depth': 10, 'min_samples_leaf': 2, 'min_samples_split': 2}} | NaN | 0.964865 | 0.811321 | 0.793812 | 0.784271 | 0.811321 | [[243, 20], [40, 15]] |
| 7 | DecisionTree Classifier | DF04 | 2025-05-08 03:06:23 | {'scheme': 30, 'randSCV': True} | {'modelParams': {'random_state': 42}, 'BestParamsRandSCV': {'criterion': 'entropy', 'max_depth': 2, 'min_samples_leaf': 8, 'min_samples_split': 13}} | NaN | 0.836486 | 0.858491 | 0.838929 | 0.841739 | 0.858491 | [[254, 9], [36, 19]] |
| 8 | Gradient Boosting | GB01 | 2025-05-08 03:06:51 | {'scheme': 20} | {'modelParams': {'random_state': 42}} | NaN | 0.971631 | 0.849057 | 0.816596 | 0.838627 | 0.849057 | [[170, 3], [29, 10]] |
| 9 | Gradient Boosting | GB02 | 2025-05-08 03:06:51 | {'scheme': 30} | {'modelParams': {'random_state': 42}} | NaN | 0.977027 | 0.852201 | 0.824893 | 0.833206 | 0.852201 | [[256, 7], [40, 15]] |
| 10 | Gradient Boosting | GB03 | 2025-05-08 03:08:31 | {'scheme': 30, 'gridSCV': True} | {'modelParams': {'random_state': 42}, 'BestParamsGridSCV': {'learning_rate': 0.5, 'max_depth': 5, 'n_estimators': 50}} | NaN | 1.000000 | 0.858491 | 0.840891 | 0.841839 | 0.858491 | [[253, 10], [35, 20]] |
| 11 | Gradient Boosting | GB04 | 2025-05-08 03:08:46 | {'scheme': 30, 'randSCV': True} | {'modelParams': {'random_state': 42}, 'BestParamsRandSCV': {'learning_rate': 0.19727005942368125, 'max_depth': 3, 'n_estimators': 64}} | NaN | 0.985135 | 0.871069 | 0.851357 | 0.860955 | 0.871069 | [[257, 6], [35, 20]] |
| 12 | K-Nearest Neighbors | KN01 | 2025-05-08 03:08:48 | {'scheme': 20} | {'modelParams': {}} | NaN | 0.855792 | 0.816038 | 0.774000 | 0.773373 | 0.816038 | [[167, 6], [33, 6]] |
| 13 | K-Nearest Neighbors | KN02 | 2025-05-08 03:08:49 | {'scheme': 30} | {'modelParams': {}} | NaN | 0.848649 | 0.814465 | 0.766051 | 0.752787 | 0.814465 | [[254, 9], [50, 5]] |
| 14 | K-Nearest Neighbors | KN03 | 2025-05-08 03:08:50 | {'scheme': 30, 'gridSCV': True} | {'modelParams': {}, 'BestParamsGridSCV': {'metric': 'manhattan', 'n_neighbors': 9, 'weights': 'uniform'}} | NaN | 0.841892 | 0.823899 | 0.747192 | 0.683551 | 0.823899 | [[262, 1], [55, 0]] |
| 15 | K-Nearest Neighbors | KN04 | 2025-05-08 03:08:51 | {'scheme': 30, 'randSCV': True} | {'modelParams': {}, 'BestParamsRandSCV': {'metric': 'manhattan', 'n_neighbors': 14, 'weights': 'uniform'}} | NaN | 0.832432 | 0.827044 | 0.748752 | 0.684002 | 0.827044 | [[263, 0], [55, 0]] |
| 16 | Logistic Regression | LR01 | 2025-05-08 03:01:02 | {'scheme': 20} | {'modelParams': {'random_state': 24}} | NaN | 0.836879 | 0.820755 | 0.744482 | 0.853036 | 0.820755 | [[173, 0], [38, 1]] |
| 17 | Logistic Regression | LR02 | 2025-05-08 03:01:02 | {'scheme': 30} | {'modelParams': {'random_state': 24}} | NaN | 0.833784 | 0.836478 | 0.770531 | 0.863472 | 0.836478 | [[263, 0], [52, 3]] |
| 18 | Logistic Regression | LR03 | 2025-05-08 03:01:07 | {'scheme': 30, 'gridSCV': True} | {'modelParams': {'random_state': 24, 'n_jobs': 2}, 'BestParamsGridSCV': {'C': 0.1, 'penalty': 'l2', 'solver': 'liblinear'}} | NaN | 0.862162 | 0.880503 | 0.852995 | 0.895589 | 0.880503 | [[263, 0], [38, 17]] |
| 19 | Logistic Regression | LR04 | 2025-05-08 03:01:10 | {'scheme': 30, 'randSCV': True} | {'modelParams': {'max_iter': 500, 'random_state': 24}, 'BestParamsRandSCV': {'C': 0.13292918943162169, 'penalty': 'l1', 'solver': 'liblinear'}} | NaN | 0.855405 | 0.874214 | 0.845258 | 0.882668 | 0.874214 | [[262, 1], [39, 16]] |
| 20 | Multi-Layer Perceptron | MP01 | 2025-05-08 03:10:02 | {'scheme': 20} | {'modelParams': {'max_iter': 300, 'random_state': 42}} | NaN | 0.822695 | 0.811321 | 0.753680 | 0.751575 | 0.811321 | [[169, 4], [36, 3]] |
| 21 | Multi-Layer Perceptron | MP02 | 2025-05-08 03:10:02 | {'scheme': 30} | {'modelParams': {'max_iter': 300, 'random_state': 42}} | NaN | 0.831081 | 0.817610 | 0.744053 | 0.682640 | 0.817610 | [[260, 3], [55, 0]] |
| 22 | Multi-Layer Perceptron | MP03 | 2025-05-08 03:10:59 | {'scheme': 30, 'gridSCV': True} | {'modelParams': {'max_iter': 300, 'random_state': 42}, 'BestParamsGridSCV': {'activation': 'tanh', 'alpha': 0.0001, 'hidden_layer_sizes': (100,), 'solver': 'adam'}} | NaN | 0.832432 | 0.827044 | 0.748752 | 0.684002 | 0.827044 | [[263, 0], [55, 0]] |
| 23 | Multi-Layer Perceptron | MP04 | 2025-05-08 03:11:16 | {'scheme': 30, 'randSCV': True} | {'modelParams': {'max_iter': 300, 'random_state': 42}, 'BestParamsRandSCV': {'activation': 'tanh', 'alpha': 0.006086584841970367, 'hidden_layer_sizes': (30, 30, 30), 'solver': 'sgd'}} | NaN | 0.832432 | 0.827044 | 0.748752 | 0.684002 | 0.827044 | [[263, 0], [55, 0]] |
| 24 | Random Forest | RF01 | 2025-05-08 03:01:11 | {'scheme': 20} | {'modelParams': {'n_jobs': 2, 'random_state': 24}} | NaN | 1.000000 | 0.849057 | 0.802944 | 0.872618 | 0.849057 | [[173, 0], [32, 7]] |
| 25 | Random Forest | RF02 | 2025-05-08 03:01:12 | {'scheme': 30} | {'modelParams': {'n_jobs': 2, 'random_state': 24}} | NaN | 1.000000 | 0.861635 | 0.820854 | 0.881466 | 0.861635 | [[263, 0], [44, 11]] |
| 26 | Random Forest | RF03 | 2025-05-08 03:06:12 | {'scheme': 30, 'gridSCV': True} | {'modelParams': {'random_state': 42, 'n_jobs': 2}, 'BestParamsGridSCV': {'max_depth': 20, 'max_features': 0.5, 'min_samples_leaf': 1, 'min_samples_split': 15, 'n_estimators': 500}} | NaN | 0.935135 | 0.855346 | 0.812711 | 0.863049 | 0.855346 | [[262, 1], [45, 10]] |
| 27 | Random Forest | RF04 | 2025-05-08 03:06:18 | {'scheme': 30, 'randSCV': True} | {'modelParams': {'random_state': 42, 'n_jobs': 2}, 'BestParamsRandSCV': {'n_estimators': 50, 'min_samples_split': 2, 'max_depth': 10}} | NaN | 0.993243 | 0.849057 | 0.800881 | 0.854986 | 0.849057 | [[262, 1], [47, 8]] |
| 28 | Support Vector Machine | SV01 | 2025-05-08 03:08:47 | {'scheme': 20} | {'modelParams': {'probability': True}} | NaN | 0.834515 | 0.816038 | 0.733374 | 0.665918 | 0.816038 | [[173, 0], [39, 0]] |
| 29 | Support Vector Machine | SV02 | 2025-05-08 03:08:48 | {'scheme': 30} | {'modelParams': {'probability': True}} | NaN | 0.832432 | 0.827044 | 0.748752 | 0.684002 | 0.827044 | [[263, 0], [55, 0]] |
| 30 | XGBoost | XG01 | 2025-05-08 03:08:52 | {'scheme': 20} | {'modelParams': {'random_state': 42, 'use_label_encoder': False, 'eval_metric': 'mlogloss'}} | NaN | 1.000000 | 0.839623 | 0.816353 | 0.818064 | 0.839623 | [[166, 7], [27, 12]] |
| 31 | XGBoost | XG02 | 2025-05-08 03:08:52 | {'scheme': 30} | {'modelParams': {'random_state': 42, 'use_label_encoder': False, 'eval_metric': 'mlogloss'}} | NaN | 1.000000 | 0.855346 | 0.836366 | 0.837337 | 0.855346 | [[253, 10], [36, 19]] |
| 32 | XGBoost | XG03 | 2025-05-08 03:09:52 | {'scheme': 30, 'gridSCV': True} | {'modelParams': {'random_state': 42, 'use_label_encoder': False, 'eval_metric': 'mlogloss'}, 'BestParamsGridSCV': {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 50, 'subsample': 0.8}} | NaN | 0.914865 | 0.874214 | 0.849896 | 0.872260 | 0.874214 | [[260, 3], [37, 18]] |
| 33 | XGBoost | XG04 | 2025-05-08 03:10:01 | {'scheme': 30, 'randSCV': True} | {'modelParams': {'random_state': 42, 'use_label_encoder': False, 'eval_metric': 'mlogloss'}, 'BestParamsRandSCV': {'learning_rate': 0.29156581270472504, 'max_depth': 4, 'n_estimators': 70, 'subsample': 0.8852444528883149}} | NaN | 1.000000 | 0.874214 | 0.855917 | 0.864935 | 0.874214 | [[257, 6], [34, 21]] |
Rank Composite¶
Metric Ranking Based on General Perception in Machine Learning¶
In many machine learning scenarios, evaluation metrics are commonly ranked in importance as follows:
- F1ScoreTest → The most balanced metric between Precision and Recall, especially useful for imbalanced datasets.
- RecallTest → Important when reducing false negatives is a priority (e.g., in healthcare, fraud detection).
- PrecisionTest → Important when reducing false positives is critical (e.g., in spam detection).
- AccuracyTest → Useful if the dataset is balanced, but can be misleading with imbalanced data.
- AccuracyTrain → Useful to check for overfitting, but not very meaningful as a standalone metric for final model evaluation.
Summary Ranking Based on General Perception:¶
| Rank | Metric | Reason |
|---|---|---|
| 1 | F1ScoreTest | Balances precision and recall, most representative metric overall. |
| 2 | RecallTest | Important for maximizing true positives. |
| 3 | PrecisionTest | Important for high accuracy in positive predictions. |
| 4 | AccuracyTest | Commonly used, but biased with imbalanced datasets. |
| 5 | AccuracyTrain | Only useful for detecting overfitting, not for final model assessment. |
Metric Weights (0-1):¶
| Metric | Weight (0-1) | Explanation |
|---|---|---|
| F1ScoreTest | 0.40 | Main focus, balances both precision and recall. |
| RecallTest | 0.20 | Important to reduce false negatives. |
| PrecisionTest | 0.15 | Important to reduce false positives. |
| AccuracyTest | 0.15 | Still considered, but not the primary metric. |
| AccuracyTrain | 0.10 | Useful for overfitting detection, minor contribution. |
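The weights above combine the five metrics into a single composite score per run. A minimal sketch of the idea, assuming a simple weighted sum; the actual logic lives in `scripts/getCalcRank_RModels.py` and may differ (e.g., the overfitting penalty applied when `AccuracyTrain` is 1):

```python
import pandas as pd

# Hypothetical re-implementation of the weighting from the table above.
WEIGHTS = {
    "F1ScoreTest": 0.40,
    "RecallTest": 0.20,
    "PrecisionTest": 0.15,
    "AccuracyTest": 0.15,
    "AccuracyTrain": 0.10,
}

def composite_score(df: pd.DataFrame) -> pd.Series:
    """Weighted sum of the five metrics (all already on a 0-1 scale)."""
    return sum(df[metric] * weight for metric, weight in WEIGHTS.items())

# Illustrative values for two of the runs above.
runs = pd.DataFrame({
    "IdRun": ["GB04", "SV01"],
    "AccuracyTrain": [0.985, 0.834],
    "AccuracyTest": [0.871, 0.816],
    "F1ScoreTest": [0.851, 0.733],
    "PrecisionTest": [0.861, 0.666],
    "RecallTest": [0.871, 0.816],
})
runs["Score"] = composite_score(runs)
# method='average' gives tied models the mean of their positions, producing
# fractional ranks like 7.5 and 28.5 as seen in the ranked output.
runs["RankComposite"] = runs["Score"].rank(ascending=False, method="average")
```

Ranking the score with `method='average'` is consistent with the fractional `RankComposite` values (7.5, 28.5) in the table that follows.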
In [ ]:
from scripts.getCalcRank_RModels import get_calc_rank_rmodels
df_model_rankscore = pd.read_csv(f"{models_path}resultRunModel_df.csv", index_col=0)
df_model_rankscore = get_calc_rank_rmodels(df_model_rankscore)
df_model_rankscore
Out[ ]:
| | RankComposite | IdRun | ModelName | Remarks | AccuracyTrain | AccuracyTest | F1ScoreTest | PrecisionTest | RecallTest | options | Params |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1.0 | GB04 | Gradient Boosting | | 0.985135 | 0.871069 | 0.851357 | 0.860955 | 0.871069 | {'scheme': 30, 'randSCV': True} | {'modelParams': {'random_state': 42}, 'BestParamsRandSCV': {'learning_rate': np.float64(0.19727005942368125), 'max_depth': 3, 'n_estimators': 64}} |
| 2 | 2.0 | LR03 | Logistic Regression | | 0.862162 | 0.880503 | 0.852995 | 0.895589 | 0.880503 | {'scheme': 30, 'gridSCV': True} | {'modelParams': {'random_state': 24, 'n_jobs': 2}, 'BestParamsGridSCV': {'C': 0.1, 'penalty': 'l2', 'solver': 'liblinear'}} |
| 3 | 3.0 | XG03 | XGBoost | | 0.914865 | 0.874214 | 0.849896 | 0.872260 | 0.874214 | {'scheme': 30, 'gridSCV': True} | {'modelParams': {'random_state': 42, 'use_label_encoder': False, 'eval_metric': 'mlogloss'}, 'BestParamsGridSCV': {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 50, 'subsample': 0.8}} |
| 4 | 4.0 | XG04 | XGBoost | Penalized: AccuracyTrain= 1 | 0.891938 | 0.874214 | 0.855917 | 0.864935 | 0.874214 | {'scheme': 30, 'randSCV': True} | {'modelParams': {'random_state': 42, 'use_label_encoder': False, 'eval_metric': 'mlogloss'}, 'BestParamsRandSCV': {'learning_rate': np.float64(0.29156581270472504), 'max_depth': 4, 'n_estimators': 70, 'subsample': np.float64(0.8852444528883149)}} |
| 5 | 5.0 | AD04 | Adaboost Classifier | | 0.871622 | 0.874214 | 0.845258 | 0.882668 | 0.874214 | {'scheme': 30, 'randSCV': True} | {'modelParams': {'random_state': 42}, 'BestParamsRandSCV': {'learning_rate': np.float64(0.7319987722668247), 'n_estimators': 79}} |
| 6 | 6.0 | LR04 | Logistic Regression | | 0.855405 | 0.874214 | 0.845258 | 0.882668 | 0.874214 | {'scheme': 30, 'randSCV': True} | {'modelParams': {'max_iter': 500, 'random_state': 24}, 'BestParamsRandSCV': {'C': np.float64(0.13292918943162169), 'penalty': 'l1', 'solver': 'liblinear'}} |
| 7 | 7.5 | AD03 | Adaboost Classifier | | 0.870270 | 0.867925 | 0.840028 | 0.864816 | 0.867925 | {'scheme': 30, 'gridSCV': True} | {'modelParams': {'random_state': 42}, 'BestParamsGridSCV': {'learning_rate': 1.0, 'n_estimators': 50}} |
| 8 | 7.5 | AD02 | Adaboost Classifier | | 0.870270 | 0.867925 | 0.840028 | 0.864816 | 0.867925 | {'scheme': 30} | {'modelParams': {'random_state': 42}} |
| 9 | 9.0 | GB03 | Gradient Boosting | Penalized: AccuracyTrain= 1 | 0.880668 | 0.858491 | 0.840891 | 0.841839 | 0.858491 | {'scheme': 30, 'gridSCV': True} | {'modelParams': {'random_state': 42}, 'BestParamsGridSCV': {'learning_rate': 0.5, 'max_depth': 5, 'n_estimators': 50}} |
| 10 | 10.0 | GB02 | Gradient Boosting | | 0.977027 | 0.852201 | 0.824893 | 0.833206 | 0.852201 | {'scheme': 30} | {'modelParams': {'random_state': 42}} |
| 11 | 11.0 | RF02 | Random Forest | Penalized: AccuracyTrain= 1 | 0.865641 | 0.861635 | 0.820854 | 0.881466 | 0.861635 | {'scheme': 30} | {'modelParams': {'n_jobs': 2, 'random_state': 24}} |
| 12 | 12.0 | RF03 | Random Forest | | 0.935135 | 0.855346 | 0.812711 | 0.863049 | 0.855346 | {'scheme': 30, 'gridSCV': True} | {'modelParams': {'random_state': 42, 'n_jobs': 2}, 'BestParamsGridSCV': {'max_depth': 20, 'max_features': 0.5, 'min_samples_leaf': 1, 'min_samples_split': 15, 'n_estimators': 500}} |
| 13 | 13.0 | XG02 | XGBoost | Penalized: AccuracyTrain= 1 | 0.877274 | 0.855346 | 0.836366 | 0.837337 | 0.855346 | {'scheme': 30} | {'modelParams': {'random_state': 42, 'use_label_encoder': False, 'eval_metric': 'mlogloss'}} |
| 14 | 14.0 | GB01 | Gradient Boosting | | 0.971631 | 0.849057 | 0.816596 | 0.838627 | 0.849057 | {'scheme': 20} | {'modelParams': {'random_state': 42}} |
| 15 | 15.0 | DF04 | DecisionTree Classifier | | 0.836486 | 0.858491 | 0.838929 | 0.841739 | 0.858491 | {'scheme': 30, 'randSCV': True} | {'modelParams': {'random_state': 42}, 'BestParamsRandSCV': {'criterion': 'entropy', 'max_depth': 2, 'min_samples_leaf': 8, 'min_samples_split': 13}} |
| 16 | 16.0 | AD01 | Adaboost Classifier | | 0.875887 | 0.853774 | 0.834035 | 0.838556 | 0.853774 | {'scheme': 20} | {'modelParams': {'random_state': 42}} |
| 17 | 17.0 | RF04 | Random Forest | | 0.993243 | 0.849057 | 0.800881 | 0.854986 | 0.849057 | {'scheme': 30, 'randSCV': True} | {'modelParams': {'random_state': 42, 'n_jobs': 2}, 'BestParamsRandSCV': {'n_estimators': 50, 'min_samples_split': 2, 'max_depth': 10}} |
| 18 | 18.0 | RF01 | Random Forest | Penalized: AccuracyTrain= 1 | 0.852208 | 0.849057 | 0.802944 | 0.872618 | 0.849057 | {'scheme': 20} | {'modelParams': {'n_jobs': 2, 'random_state': 24}} |
| 19 | 19.0 | XG01 | XGBoost | Penalized: AccuracyTrain= 1 | 0.862265 | 0.839623 | 0.816353 | 0.818064 | 0.839623 | {'scheme': 20} | {'modelParams': {'random_state': 42, 'use_label_encoder': False, 'eval_metric': 'mlogloss'}} |
| 20 | 20.0 | DF03 | DecisionTree Classifier | | 0.964865 | 0.811321 | 0.793812 | 0.784271 | 0.811321 | {'scheme': 30, 'gridSCV': True} | {'modelParams': {'random_state': 42}, 'BestParamsGridSCV': {'criterion': 'gini', 'max_depth': 10, 'min_samples_leaf': 2, 'min_samples_split': 2}} |
| 21 | 21.0 | LR02 | Logistic Regression | | 0.833784 | 0.836478 | 0.770531 | 0.863472 | 0.836478 | {'scheme': 30} | {'modelParams': {'random_state': 24}} |
| 22 | 22.0 | KN01 | K-Nearest Neighbors | | 0.855792 | 0.816038 | 0.774000 | 0.773373 | 0.816038 | {'scheme': 20} | {'modelParams': {}} |
| 23 | 23.0 | LR01 | Logistic Regression | | 0.836879 | 0.820755 | 0.744482 | 0.853036 | 0.820755 | {'scheme': 20} | {'modelParams': {'random_state': 24}} |
| 24 | 24.0 | KN02 | K-Nearest Neighbors | | 0.848649 | 0.814465 | 0.766051 | 0.752787 | 0.814465 | {'scheme': 30} | {'modelParams': {}} |
| 25 | 25.0 | DF02 | DecisionTree Classifier | Penalized: AccuracyTrain= 1 | 0.837842 | 0.783019 | 0.783790 | 0.784576 | 0.783019 | {'scheme': 30} | {'modelParams': {'random_state': 42}} |
| 26 | 26.0 | MP01 | Multi-Layer Perceptron | | 0.822695 | 0.811321 | 0.753680 | 0.751575 | 0.811321 | {'scheme': 20} | {'modelParams': {'max_iter': 300, 'random_state': 42}} |
| 27 | 28.5 | SV02 | Support Vector Machine | | 0.832432 | 0.827044 | 0.748752 | 0.684002 | 0.827044 | {'scheme': 30} | {'modelParams': {'probability': True}} |
| 28 | 28.5 | MP04 | Multi-Layer Perceptron | | 0.832432 | 0.827044 | 0.748752 | 0.684002 | 0.827044 | {'scheme': 30, 'randSCV': True} | {'modelParams': {'max_iter': 300, 'random_state': 42}, 'BestParamsRandSCV': {'activation': 'tanh', 'alpha': np.float64(0.006086584841970367), 'hidden_layer_sizes': (30, 30, 30), 'solver': 'sgd'}} |
| 29 | 28.5 | KN04 | K-Nearest Neighbors | | 0.832432 | 0.827044 | 0.748752 | 0.684002 | 0.827044 | {'scheme': 30, 'randSCV': True} | {'modelParams': {}, 'BestParamsRandSCV': {'metric': 'manhattan', 'n_neighbors': 14, 'weights': 'uniform'}} |
| 30 | 28.5 | MP03 | Multi-Layer Perceptron | | 0.832432 | 0.827044 | 0.748752 | 0.684002 | 0.827044 | {'scheme': 30, 'gridSCV': True} | {'modelParams': {'max_iter': 300, 'random_state': 42}, 'BestParamsGridSCV': {'activation': 'tanh', 'alpha': 0.0001, 'hidden_layer_sizes': (100,), 'solver': 'adam'}} |
| 31 | 31.0 | KN03 | K-Nearest Neighbors | | 0.841892 | 0.823899 | 0.747192 | 0.683551 | 0.823899 | {'scheme': 30, 'gridSCV': True} | {'modelParams': {}, 'BestParamsGridSCV': {'metric': 'manhattan', 'n_neighbors': 9, 'weights': 'uniform'}} |
| 32 | 32.0 | MP02 | Multi-Layer Perceptron | | 0.831081 | 0.817610 | 0.744053 | 0.682640 | 0.817610 | {'scheme': 30} | {'modelParams': {'max_iter': 300, 'random_state': 42}} |
| 33 | 33.0 | SV01 | Support Vector Machine | | 0.834515 | 0.816038 | 0.733374 | 0.665918 | 0.816038 | {'scheme': 20} | {'modelParams': {'probability': True}} |
| 34 | 34.0 | DF01 | DecisionTree Classifier | Penalized: AccuracyTrain= 1 | 0.812591 | 0.764151 | 0.750122 | 0.739353 | 0.764151 | {'scheme': 20} | {'modelParams': {'random_state': 42}} |
Business Dashboard¶
This business dashboard was created with Google Looker Studio.

The dashboard can be accessed through the following link: 🔗 lookerstudio.google.com/s/grjGn3AvSms
Prediction (optional)¶
In [ ]:
# =========================Import module and setup=========================
import os
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from joblib import load
from prettytable import PrettyTable
# Configure pandas display so DataFrames are shown in full while debugging
pd.set_option("display.max_columns", None)
pd.set_option("display.max_colwidth", None)
pd.set_option("display.width", 0)
pd.set_option("display.expand_frame_repr", False)
# =========================1. Load Dataset & Model=========================
employee_file = 'saved/employee_attrition_nan.csv'
employee_df = pd.read_csv(employee_file)
model_file = "models/model_GB04.pkl"
loaded_model = load(model_file)
# =========================2. Column Data Types=========================
employee_col_numeric_int = pd.Index([
    'Age', 'DailyRate', 'DistanceFromHome', 'HourlyRate',
    'MonthlyIncome', 'MonthlyRate', 'NumCompaniesWorked',
    'PercentSalaryHike', 'TotalWorkingYears', 'TrainingTimesLastYear',
    'YearsAtCompany', 'YearsInCurrentRole',
    'YearsSinceLastPromotion', 'YearsWithCurrManager'
])
employee_col_categorical = pd.Index([
    'BusinessTravel', 'Department', 'Education', 'EducationField',
    'EnvironmentSatisfaction', 'Gender', 'JobInvolvement', 'JobLevel',
    'JobRole', 'JobSatisfaction', 'MaritalStatus', 'OverTime',
    'PerformanceRating', 'RelationshipSatisfaction', 'StockOptionLevel',
    'WorkLifeBalance'
])
employee_col_object = pd.Index(['EmployeeId'])
# =========================3. Data Type Conversion=========================
employee_df[employee_col_numeric_int] = employee_df[employee_col_numeric_int].astype('int64')
employee_df[employee_col_categorical] = employee_df[employee_col_categorical].astype('category')
employee_df['EmployeeId'] = employee_df['EmployeeId'].astype('object')
# =========================4. Encode Categorical Columns=========================
# Note: EmployeeId is an object column, so it is label-encoded here as well;
# the ids printed below are therefore the encoded values (0, 1, 2, ...).
categorical_cols = employee_df.select_dtypes(include=['object', 'category', 'bool']).columns.tolist()
employee_df_encode = employee_df.copy()
le = LabelEncoder()
for col in categorical_cols:
    employee_df_encode[col] = le.fit_transform(employee_df_encode[col])
# =========================5. Predictions & Results=========================
employee_id_list = employee_df_encode['EmployeeId'].tolist()
results = []
for employee_id in employee_id_list:
    prediction_row = employee_df_encode[employee_df_encode['EmployeeId'] == employee_id]
    prediction_row = prediction_row.drop(columns=['EmployeeId', 'Attrition'])
    prediction = loaded_model.predict(prediction_row)[0]
    prediction_label = "Active" if prediction == 0 else "Resigned"
    results.append({
        'EmployeeId': employee_id,
        'Prediction': prediction,
        'Status': prediction_label
    })
df_predictions = pd.DataFrame(results)
# =========================6. Save Results to CSV=========================
# employee_file already contains the 'saved/' prefix, so build the output
# path from it directly instead of joining 'saved' a second time.
output_file = employee_file.replace('.csv', '_predictions.csv')
df_predictions.to_csv(output_file, index=False)
# =========================7. Print Results with PrettyTable=========================
table = PrettyTable()
table.field_names = df_predictions.columns.tolist()
for _, row in df_predictions.iterrows():
    table.add_row(row.tolist())
print(table)
+------------+------------+----------+
| EmployeeId | Prediction |  Status  |
+------------+------------+----------+
|     0      |     0      |  Active  |
|     1      |     0      |  Active  |
|     2      |     0      |  Active  |
|     3      |     0      |  Active  |
|     4      |     0      |  Active  |
...
|    409     |     0      |  Active  |
|    410     |     0      |  Active  |
|    411     |     0      |  Active  |
+------------+------------+----------+
(412 rows: 374 predicted Active, 38 predicted Resigned; full output truncated)
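The per-row loop in step 5 above makes one `predict` call per employee; scikit-learn estimators accept the whole feature matrix at once, so the same results can be produced in a single vectorized call. A minimal sketch, using a hypothetical toy frame and a `DummyClassifier` as stand-ins for `employee_df_encode` and the loaded GB04 model:

```python
import pandas as pd
from sklearn.dummy import DummyClassifier

# Tiny stand-in dataset and model so this sketch runs on its own; in the
# notebook, employee_df_encode and loaded_model take their place.
toy = pd.DataFrame({
    "EmployeeId": [0, 1, 2],
    "Attrition": [0, 1, 0],
    "Age": [34, 28, 45],
    "MonthlyIncome": [5000, 3200, 8800],
})
features = toy.drop(columns=["EmployeeId", "Attrition"])
model = DummyClassifier(strategy="most_frequent").fit(features, toy["Attrition"])

# One vectorized predict() call replaces the per-employee loop.
preds = model.predict(features)
df_predictions = pd.DataFrame({
    "EmployeeId": toy["EmployeeId"],
    "Prediction": preds,
    "Status": pd.Series(preds, index=toy.index).map({0: "Active", 1: "Resigned"}),
})
```

Besides being shorter, this avoids the O(n) filter-and-predict passes over the DataFrame, which matters as the employee table grows.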